Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule

Author

  • Bradley Efron
Abstract

A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased but can be highly variable. This article discusses bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. A particular bootstrap method, the 632+ rule, is shown to substantially outperform cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we only study classification problems with 0-1 loss in detail. Our simulations include "smooth" prediction rules like Fisher's Linear Discriminant Function, and unsmooth ones like Nearest Neighbors.
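To make the contrast in the abstract concrete, here is a minimal sketch, under stated assumptions, of three estimators for a classifier under 0-1 loss: leave-one-out cross-validation, the leave-one-out bootstrap estimate Err^(1) (each point is scored only by rules fitted on bootstrap samples that omit it, which averages over many training sets and so "smooths" cross-validation), and the 632+ combination as it is commonly stated. That combination is err_bar + w (Err^(1) - err_bar) with weight w = 0.632 / (1 - 0.368 R), where err_bar is the apparent (resubstitution) error, gamma = sum_l p_l (1 - q_l) is the no-information error rate, and R = (Err^(1) - err_bar) / (gamma - err_bar) is the relative overfitting rate, clipped to [0, 1]. This is not code from the article. The function names, the 1-nearest-neighbor example rule, and the default of B = 50 bootstrap samples are hypothetical choices made for this illustration.

```python
# Illustrative sketch only: names, defaults, and the 1-NN rule are hypothetical,
# not taken from the article. Loss is 0-1 (misclassification).
import random
from collections import Counter
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
Rule = Callable[[Point], int]
# Hypothetical interface: a fitting procedure maps a training set to a prediction rule.
Fitter = Callable[[List[Point], List[int]], Rule]


def nearest_neighbor(xs: List[Point], ys: List[int]) -> Rule:
    """1-nearest-neighbor classifier (an 'unsmooth' rule in the abstract's sense)."""
    def predict(x: Point) -> int:
        best = min(range(len(xs)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(xs[i], x)))
        return ys[best]
    return predict


def loo_cv_error(fit: Fitter, xs: List[Point], ys: List[int]) -> float:
    """Leave-one-out cross-validation estimate of prediction error."""
    n = len(xs)
    wrong = 0
    for i in range(n):
        # Refit without point i, then score the held-out point.
        rule = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        wrong += rule(xs[i]) != ys[i]
    return wrong / n


def loo_bootstrap_error(fit: Fitter, xs: List[Point], ys: List[int],
                        B: int = 50, seed: int = 0) -> float:
    """Leave-one-out bootstrap Err^(1): each point is scored only by rules
    fitted on bootstrap samples that do not contain it."""
    rng = random.Random(seed)
    n = len(xs)
    losses: List[List[int]] = [[] for _ in range(n)]
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(idx)
        rule = fit([xs[j] for j in idx], [ys[j] for j in idx])
        for i in range(n):
            if i not in in_bag:
                losses[i].append(int(rule(xs[i]) != ys[i]))
    per_point = [sum(l) / len(l) for l in losses if l]  # skip points never left out
    return sum(per_point) / len(per_point)


def err_632_plus(fit: Fitter, xs: List[Point], ys: List[int],
                 B: int = 50, seed: int = 0) -> float:
    """632+ estimate, following the commonly stated form of the rule."""
    n = len(ys)
    rule = fit(xs, ys)
    preds = [rule(x) for x in xs]
    err_bar = sum(p != y for p, y in zip(preds, ys)) / n  # apparent error
    err1 = loo_bootstrap_error(fit, xs, ys, B, seed)
    # No-information error rate: sum over labels l of p_l * (1 - q_l).
    p, q = Counter(ys), Counter(preds)
    gamma = sum((p[l] / n) * (1 - q[l] / n) for l in p)
    err1_prime = min(err1, gamma)
    if err1 > err_bar and gamma > err_bar:
        r = (err1_prime - err_bar) / (gamma - err_bar)  # relative overfitting rate in [0, 1]
    else:
        r = 0.0
    err_632 = 0.368 * err_bar + 0.632 * err1
    return err_632 + (err1_prime - err_bar) * (0.368 * 0.632 * r) / (1 - 0.368 * r)


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
    ys = [0, 1, 0, 1, 0, 1]  # alternating labels: 1-NN has apparent error 0
    print("LOO CV:", loo_cv_error(nearest_neighbor, xs, ys))
    print("Err^(1):", loo_bootstrap_error(nearest_neighbor, xs, ys))
    print("632+:", err_632_plus(nearest_neighbor, xs, ys))
```

The alternating-label example is deliberately one where 1-nearest-neighbor fits the training data perfectly (apparent error 0) yet predicts held-out points badly, which is the overfitting situation the 632+ weighting is designed to account for.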

Similar Resources

Selection of Tree-based Classifiers with the Bootstrap 632+ Rule

…wardly biased as with the standard bootstrap. The use of the bootstrap 632+ rule for model evaluation in this problem constitutes a first real application of the method. Model prediction error for the selected CART compared favorably with that estimated by Fisher's linear discriminant function. Acknowledgments: Collaboration with C. Chemini and the Centro di Ecologia Alpina of Trento is gratefu...

Selection bias in gene extraction on the basis of microarray gene-expression data.

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only...

A comparison of bootstrap methods and an adjusted bootstrap approach for estimating prediction error in microarray classification (short title: Bootstrap Prediction Error Estimation)

SUMMARY: This paper first provides a critical review of some existing methods for estimating prediction error in classifying microarray data where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We intr...

Estimating misclassification error with small samples via bootstrap cross-validation

MOTIVATION: Estimation of misclassification error has received increasing attention in clinical diagnosis and bioinformatics studies, especially in small sample studies with microarray data. Current error estimation methods are not satisfactory because they either have large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and leave-one-out bootstrap). W...

Fast robust estimation of prediction error based on resampling

Robust estimators of the prediction error of a linear model are proposed. The estimators are based on the resampling techniques cross-validation and bootstrap. The robustness of the prediction error estimators is obtained by robustly estimating the regression parameters of the linear model and by trimming the largest prediction errors. To avoid the recalculation of time-consuming robust regressi...

Publication date: 1995